Software Vault: The Diamond Collection

Text File | 1994-11-06 | 6KB | 139 lines

Some notes about 386video...

IT'S LIMITED

The current 386video module does not exploit the full power of XVD drivers. I coded 386video with a generic interface (the interface won't change in future releases), but the underlying code is focused on 320x200 256-color screen modes only. If you want to exploit the full power of XVD drivers you'll have to enhance 386video yourself (sorry, I'm under too much pressure to do it now).

RAM BUFFERING and the VRAM BOTTLENECK

I use RAM BUFFERING to render each frame: first I compose the next graphic image in SYSTEM RAM and then blit it to DISPLAY RAM.

The main reason to use RAM BUFFERING is that display ram is usually SLOWER than system ram, and display ram usually has less i/o bandwidth available to the processor. What's more, system ram is faster to cache than display ram (again the i/o bottleneck). So if you have to access the display frame you are composing multiple times, it is better to render it in faster system ram and then copy it once to vram.

Another reason to use ram buffering is that if you have only one visible display page you can't use the double display page trick, so when you update the display you have to be as fast as you can be. If all you have to do is copy the buffered image, you are sure to be using the fastest update method.

On some systems a big cache and a good bus interface make vram look as fast as system ram, but your program has to run even on "weak" systems with vram bandwidth bottlenecks.

DELTA BLITTING

Plain ram buffering works well if system ram is a lot faster than vram AND you don't have bus bandwidth bottlenecks. The plain ISA bus is 16 bits wide (8 bits for some cards) with a standard 8 MHz clock. This translates to a 1..2 Mbyte/sec bandwidth when copying from memory to memory, while a plain 386/25 has at least 4..8 Mbyte/sec of bandwidth available when accessing system ram.
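To put those bus figures in perspective, here is a rough back-of-the-envelope sketch in C (the module itself is 386 assembly). The function name is made up for illustration, and 1 Mbyte is assumed to mean 1,000,000 bytes. It only bounds how many full-frame copies per second a given bus bandwidth allows and ignores the time spent composing the frame.

```c
#include <assert.h>

/* Upper bound on full-frame copies per second for a given bus bandwidth.
 * Hypothetical helper, not part of 386video. */
double max_full_frame_fps(double bus_bytes_per_sec,
                          unsigned width, unsigned height,
                          unsigned bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);

    /* Size of one complete frame in bytes. */
    double frame_bytes = (double)width * height * bytes_per_pixel;

    /* Every frame must cross the bus once, so bandwidth / size is the cap. */
    return bus_bytes_per_sec / frame_bytes;
}
```

A 320x200 8 bit/pixel frame is 64,000 bytes, so under these assumptions max_full_frame_fps(1e6, 320, 200, 1) gives about 15.6 and max_full_frame_fps(2e6, 320, 200, 1) gives 31.25. Copying the whole frame over a bus in that range cannot go much faster, whatever the processor.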
Some systems support a "speeded up ISA" bus (mine can run ISA at 12 MHz, others support internal buffering and "fast cycling"), but even if you run on a 120 MIPS Pentium, with an ISA (8-bit or 16-bit) card you can't go far.

The answer to these bottlenecks is DELTA BLITTING: instead of blitting the whole display page, blit only the differences between the previous display frame and the next. Usually there are strong correlations between the image already displayed and the next one still in system ram, so it is possible to boost animation speed a lot.

The speed of my 'test program' was 23..24 Frames Per Second (FPS) with "simple ram buffering", and it skyrocketed to 56..60 FPS when I turned on delta blitting. Of course your mileage may vary: it is up to the programmer to set up the appropriate "delta clipping" methods for the kind of animation being performed.

Maybe you are thinking "HA! Now there are VL-BUS and PCI, I don't need to program for fucking old ISA ...". Well, given the current trend, the VL-BUS and PCI buses you think are fast now are going to be a bottleneck for a 300 MHz SSPHARK (SuperScalar Processor from Hell with Advanced Risky Killpower ;) ) driving a 4096x4096 24-bit color mode on a 100 inch display.

THE TOUCHMAP

The bitmap you render in system ram contains the image you want on screen. If you want to "blit only the differences" you have to store some information to "remember" from one frame to the next what has changed. I call this structure a TOUCHMAP: every time you modify (touch) the bitmap you store some info in the touchmap, so that when you delta-blit you can use the touchmap to see where the altered pixels are.

If you want speed, the "touchmap composition" has to be an algorithm of O(n) (linear) computational complexity, and its overhead has to be as small as possible.
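The touch-then-blit cycle can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from 386video.asm: it assumes a 320x200 8 bit/pixel buffer, the one-flag-bit-per-four-pixels layout that the notes settle on further down, and that bit 0 of each dword flags the leftmost group (the real bit order is not stated here). All names are hypothetical, and a plain bit loop stands in for the unrolled loops of the real module.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { WIDTH = 320, HEIGHT = 200 };
/* One flag bit per 4 pixels, row length rounded up to whole dwords. */
enum { ROW_DWORDS = ((WIDTH / 4) + 31) / 32 };

uint32_t touchmap[HEIGHT][ROW_DWORDS];

/* Mark the 4-pixel group containing (x, y) as modified. */
void touch_pixel(unsigned x, unsigned y)
{
    assert(x < WIDTH && y < HEIGHT);
    unsigned group = x / 4;
    touchmap[y][group / 32] |= (uint32_t)1 << (group % 32);
}

/* Copy only the touched groups from the system ram buffer to vram,
 * clearing the flags on the way. Returns the number of bytes copied. */
unsigned delta_blit(uint8_t *vram, const uint8_t *buffer)
{
    unsigned copied = 0;
    for (unsigned y = 0; y < HEIGHT; y++) {
        for (unsigned d = 0; d < ROW_DWORDS; d++) {
            uint32_t bits = touchmap[y][d];
            if (bits == 0)
                continue;            /* up to 128 untouched pixels skipped at once */
            touchmap[y][d] = 0;      /* flags are consumed by this blit */
            for (unsigned b = 0; b < 32; b++) {
                if (bits & ((uint32_t)1 << b)) {
                    unsigned off = y * WIDTH + (d * 32 + b) * 4;
                    memcpy(vram + off, buffer + off, 4);   /* one dword of pixels */
                    copied += 4;
                }
            }
        }
    }
    return copied;
}
```

Drawing code would call touch_pixel (or a span version of it) whenever it writes to the buffer, and the frame update would call delta_blit once. The cost of the blit then follows the number of touched groups rather than the size of the screen.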
I evaluated various touchmapping methods; here is the one I chose:

A bit-equivalent mask of the display bitmap where ONE BIT in the TOUCHMAP "marks" FOUR PIXELS (A DWORD) on the BITMAP.

The bitmap/touchmap size ratio is around 32/1 (quite good) and the touchbits are packed into DWORDS (so when you "touch" you use the massive speed of 32-bit transfers instead of slow bit-by-bit things). Using loop unrolling you can pump data to the video card at full speed.

Nota bene:
The touchmap is an ARRAY OF DWORDS; each bit in a dword is a flag for four consecutive pixels. The touchmap has as many rows as the logical display screen height in pixels, and each row has as many BITS as logical_display_screen_width/4, rounded up to a multiple of 32 (so the length of a touchmap row can be expressed in dwords). For a 320-pixel-wide screen that is 80 flag bits, rounded up to 96 bits, or 3 dwords per row.

When you manipulate the touchmap ALWAYS USE DWORD ACCESS; this is an absolute must to minimize "touchmap updating" overhead.

I've tested various methods, and the "dword sized" touchmap is faster than anything else on a 386 class processor animating lots of independent objects. This is due to the 32 to 1 ratio between actual pixel data and touchmap data, and to the "always aligned dword" access you can use with this method.

To further reduce memory usage I use a self-compiling "loop unroller". This way, instead of checking each bit, I check a byte and call the appropriate "unrolled loop" for it. With this method I perform only one compare and call instead of eight compares and branches (this keeps my 386 happy, because the fewer the jumps, the more the pipeline stays filled and running).

WHY 256 COLORS ONLY

The 8 bit/pixel modes are the least processor intensive you can find (this means lots of speed), and 256 colors are good enough for most games. You can blit 4 dots in a single memory access, mask quickly and implement fast compression/decompression methods if needed.
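As an illustration of "4 dots in a single memory access" and of quick masking, here is a small C sketch. It assumes that color index 0 means "transparent", which is a common sprite convention but is not stated in these notes, and both function names are hypothetical.

```c
#include <stdint.h>

/* Build a per-byte mask for four packed 8-bit pixels:
 * 0xFF where the source pixel is opaque (non-zero), 0x00 where it is
 * transparent (index 0, an assumption of this sketch). */
uint32_t opaque_mask4(uint32_t src4)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 4; i++) {
        if ((src4 >> (8 * i)) & 0xFFu)
            mask |= (uint32_t)0xFFu << (8 * i);
    }
    return mask;
}

/* Merge four source pixels over four destination pixels in one step:
 * keep the destination where the mask is 0, take the source where it is 0xFF. */
uint32_t masked_blit4(uint32_t dst4, uint32_t src4, uint32_t mask)
{
    return (dst4 & ~mask) | (src4 & mask);
}
```

In a real sprite routine the masks would normally be computed once per sprite and stored, so the inner loop is just the AND/OR merge on whole dwords. With 16 or 24 bit/pixel modes the same dword carries only two pixels or fewer.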
If you think 16 or 24 bit/pixel modes are nicer to look at, you are right, but the most common display cards use dynamic ram, and this means that the higher the video refresh bandwidth, the lower the cpu/blitter bandwidth. What's more, I want things that can run in 4 Mbyte, and having bitmaps two to four times the size of plain 256-color ones is no good.

For further info and explanations look into 386video.asm, 386video.inc, driver.txt, xvd.txt, makefile and the XVD driver sources (for example chips450.asm).

Ciao,
Lorenzo Micheletto
knight@maya.dei.unipd.it